Rank | Count | Beginning |
---|---|---|
5839 | 3127 | El |
15343 | 2156 | La |
9265 | 1866 | En |
18556 | 829 | Los |
22974 | 687 | Por |
29212 | 515 | Y |
20637 | 480 | No |
24994 | 475 | Se |
11502 | 453 | Es |
5 | 441 | A |
22344 | 401 | Pero |
17325 | 391 | Las |
4436 | 384 | De |
21790 | 357 | Para |
12648 | 294 | Este |
28237 | 293 | Un |
25923 | 279 | Si |
3533 | 270 | Con |
27171 | 250 | También |
28224 | 250 | Una |
18285 | 247 | Lo |
12197 | 242 | Esta |
853 | 234 | Al |
26231 | 229 | Sin |
25189 | 214 | Según |
180 | 211 | Además, |
5034 | 189 | Desde |
3272 | 170 | Como |
14393 | 143 | Hay |
12982 | 129 | Esto |
This and the following three subsections show the most frequent sentence beginnings consisting of N words, for N=1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
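For readers without database access, the same count can be sketched in Python. This is only an illustrative equivalent of the query above, not the code used to produce the table: the function name `sentence_beginnings`, its parameters `n` and `limit`, and the sample sentences are assumptions made for the example. As in the query, sentences are split on single spaces only, so attached punctuation stays part of the beginning (compare the entry "Además," in the table).

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent beginnings made of n space-separated tokens.

    Hypothetical helper: mirrors substring_index(sentence, ' ', n) followed by
    GROUP BY / ORDER BY count DESC / LIMIT in the query above.
    """
    counts = Counter()
    for sentence in sentences:
        # Keep everything before the n-th space; if the sentence has fewer
        # than n spaces, the whole sentence is kept (as substring_index does).
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # most_common sorts by descending count.
    return counts.most_common(limit)


# Made-up sample sentences, for illustration only.
sample = [
    "El gato duerme.",
    "El perro ladra.",
    "Además, llueve mucho.",
    "La casa es grande.",
]

top = sentence_beginnings(sample, n=1)
for beginning, count in top:
    print(count, beginning)
```

Calling the same function with `n=2`, `n=3` or `n=4` gives the longer beginnings covered in the following subsections.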
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV